Overview

Dataset statistics

Number of variables: 40
Number of observations: 160
Missing cells: 417
Missing cells (%): 6.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 234.1 KiB
Average record size in memory: 1.5 KiB
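
The percentage and per-record figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming the report divides missing cells by the total cell count (variables × observations) and total memory by the number of observations. The variable names are illustrative only and are not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 40
n_observations = 160
missing_cells = 417
total_memory_kib = 234.1

# Assumption: "Missing cells (%)" is missing cells over all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size is total memory over the row count.
avg_record_kib = total_memory_kib / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Under these assumptions the computed values round to the 6.5% and 1.5 KiB reported above.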

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2007" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
ID_OCUPA_N has constant value "" Constant
CLASSI_FIN has constant value "" Constant
DEXAME has a high cardinality: 103 distinct values High cardinality
DTRATA has a high cardinality: 60 distinct values High cardinality
SEM_NOT is highly correlated with SEM_PRI High correlation
SG_UF_NOT is highly correlated with ID_MUNICIP High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP High correlation
COUFINF is highly correlated with ID_REGIONA and 6 other fields High correlation
PMM is highly correlated with ID_REGIONA and 4 other fields High correlation
CS_RACA is highly correlated with DTRATA and 1 other field High correlation
RESULT is highly correlated with AT_SINTOMA and 6 other fields High correlation
AT_SINTOMA is highly correlated with RESULT and 3 other fields High correlation
ID_UNIDADE is highly correlated with AT_LAMINA High correlation
ID_REGIONA is highly correlated with COUFINF and 6 other fields High correlation
SG_UF_NOT is highly correlated with COUFINF and 6 other fields High correlation
SEM_NOT is highly correlated with DTRATA and 3 other fields High correlation
DTRATA is highly correlated with COUFINF and 14 other fields High correlation
AT_LAMINA is highly correlated with RESULT and 6 other fields High correlation
CS_ESCOL_N is highly correlated with CS_RACA and 2 other fields High correlation
ID_MUNICIP is highly correlated with COUFINF and 4 other fields High correlation
NU_IDADE_N is highly correlated with CS_ESCOL_N and 2 other fields High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields High correlation
LOC_INF is highly correlated with PMM and 8 other fields High correlation
COPAISINF is highly correlated with SEM_NOT and 1 other field High correlation
DSTRAESQUE is highly correlated with DTRATA and 2 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 5 other fields High correlation
CS_GESTANT is highly correlated with CS_SEXO High correlation
TRA_ESQUEM is highly correlated with RESULT and 2 other fields High correlation
SEM_PRI is highly correlated with COUFINF and 3 other fields High correlation
AT_ATIVIDA is highly correlated with RESULT and 6 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with PMM and 6 other fields High correlation
ID_MN_RESI is highly correlated with PMM and 8 other fields High correlation
COUFINF is highly correlated with ID_REGIONA and 10 other fields High correlation
ID_REGIONA is highly correlated with COUFINF and 11 other fields High correlation
DTRATA is highly correlated with ID_OCUPA_N and 12 other fields High correlation
CS_ESCOL_N is highly correlated with ID_OCUPA_N and 7 other fields High correlation
ID_OCUPA_N is highly correlated with COUFINF and 24 other fields High correlation
DSTRAESQUE is highly correlated with DTRATA and 8 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields High correlation
CS_SEXO is highly correlated with ID_OCUPA_N and 8 other fields High correlation
LOC_INF is highly correlated with ID_REGIONA and 10 other fields High correlation
SG_UF is highly correlated with COUFINF and 24 other fields High correlation
CS_RACA is highly correlated with ID_OCUPA_N and 7 other fields High correlation
RESULT is highly correlated with DTRATA and 11 other fields High correlation
AT_SINTOMA is highly correlated with ID_OCUPA_N and 11 other fields High correlation
SG_UF_NOT is highly correlated with COUFINF and 11 other fields High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields High correlation
AT_LAMINA is highly correlated with ID_OCUPA_N and 10 other fields High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields High correlation
TPAUTOCTO is highly correlated with ID_OCUPA_N and 8 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields High correlation
CS_GESTANT is highly correlated with ID_OCUPA_N and 8 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields High correlation
TRA_ESQUEM is highly correlated with DTRATA and 9 other fields High correlation
AT_ATIVIDA is highly correlated with ID_OCUPA_N and 9 other fields High correlation
CLASSI_FIN is highly correlated with COUFINF and 24 other fields High correlation
PCRUZ is highly correlated with DTRATA and 8 other fields High correlation
DT_NASC has 13 (8.1%) missing values Missing
DT_INVEST has 160 (100.0%) missing values Missing
PMM has 84 (52.5%) missing values Missing
DT_ENCERRA has 160 (100.0%) missing values Missing
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 43 (26.9%) zeros Zeros
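
The Constant, Missing and Zeros warnings above can be reproduced from simple per-column counts. The sketch below is a plain-Python illustration, not pandas-profiling's own implementation; the function name `column_warnings` and the toy columns are hypothetical. It assumes `None` marks a missing cell and that an empty string counts as a present value, which is consistent with ID_RG_RESI being reported as constant `""` rather than missing.

```python
def column_warnings(values):
    """Return (label, detail) warnings for one column; None marks a missing cell."""
    n = len(values)
    present = [v for v in values if v is not None]
    found = []
    if len(present) < n:
        missing = n - len(present)
        found.append(("Missing", f"{missing} ({100 * missing / n:.1f}%)"))
    # A column is constant when all present values are identical.
    if present and len(set(present)) == 1:
        found.append(("Constant", repr(present[0])))
    zeros = sum(1 for v in present if v == 0)
    if zeros:
        found.append(("Zeros", f"{zeros} ({100 * zeros / n:.1f}%)"))
    return found


# Toy columns shaped like three of the warnings above (values are made up).
empty_strings = [""] * 160
half_missing = list(range(1, 77)) + [None] * 84
some_zeros = [0] * 43 + list(range(1, 118))

for name, column in [("empty_strings", empty_strings),
                     ("half_missing", half_missing),
                     ("some_zeros", some_zeros)]:
    print(name, column_warnings(column))
```

With 160 rows, 84 missing cells give 52.5% and 43 zeros give 26.9%, matching the PMM and COPAISINF percentages reported above.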

Reproduction

Analysis started: 2021-07-06 18:37:23.513040
Analysis finished: 2021-07-06 18:37:45.058346
Duration: 21.55 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 KiB
2
160 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters160
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
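
As a rough illustration of the category counts in these sections, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the report's own code, and the helper name `category_counts` is hypothetical. The standard library exposes the general category (for example `Nd` for Decimal Number, `Lu` for Uppercase Letter) but not script or block, so only the "categories" figures are covered here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in a column."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Constant columns taken from the warnings list: "2" and "B54", 160 rows each.
tp_not = ["2"] * 160
id_agravo = ["B54"] * 160

print(category_counts(tp_not))     # only decimal digits
print(category_counts(id_agravo))  # two digits and one uppercase letter per row
```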

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

ValueCountFrequency (%)
2160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2160
100.0%

Most occurring characters

ValueCountFrequency (%)
2160
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2160
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2160
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2160
100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 12.0 KiB
B54
160 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters480
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54160
100.0%

Most occurring characters

ValueCountFrequency (%)
B160
33.3%
5160
33.3%
4160
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number320
66.7%
Uppercase Letter160
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5160
50.0%
4160
50.0%
Uppercase Letter
ValueCountFrequency (%)
B160
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common320
66.7%
Latin160
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5160
50.0%
4160
50.0%
Latin
ValueCountFrequency (%)
B160
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII480
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B160
33.3%
5160
33.3%
4160
33.3%

Distinct: 109
Distinct (%): 68.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB
Minimum: 2007-01-03 00:00:00
Maximum: 2007-12-31 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 47
Distinct (%): 29.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 200727.2938
Minimum: 200701
Maximum: 200801
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 200701
5-th percentile: 200703
Q1: 200714.75
median: 200726
Q3: 200741
95-th percentile: 200752
Maximum: 200801
Range: 100
Interquartile range (IQR): 26.25

Descriptive statistics

Standard deviation: 16.45294295
Coefficient of variation (CV): 8.196664561 × 10^-5
Kurtosis: 0.9620209131
Mean: 200727.2938
Median Absolute Deviation (MAD): 13.5
Skewness: 0.5734813807
Sum: 32116367
Variance: 270.6993318
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=47)
Common values

Value              Count  Frequency (%)
200752             9      5.6%
200718             7      4.4%
200741             6      3.8%
200719             6      3.8%
200737             6      3.8%
200703             5      3.1%
200728             5      3.1%
200715             5      3.1%
200749             5      3.1%
200713             4      2.5%
Other values (37)  102    63.7%

Minimum 10 values

Value              Count  Frequency (%)
200701             4      2.5%
200703             5      3.1%
200704             2      1.2%
200705             2      1.2%
200706             2      1.2%
200707             4      2.5%
200708             4      2.5%
200709             1      0.6%
200710             4      2.5%
200711             2      1.2%

Maximum 10 values

Value              Count  Frequency (%)
200801             1      0.6%
200752             9      5.6%
200750             4      2.5%
200749             5      3.1%
200748             4      2.5%
200747             1      0.6%
200746             3      1.9%
200745             3      1.9%
200743             3      1.9%
200742             3      1.9%
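
The range of 100 and the jump from 200752 to 200801 suggest that SEM_NOT packs a year and a week number into one integer (YYYYWW). That reading is an assumption, not something the report states. If it holds, the mean, variance and similar statistics above describe the encoding rather than elapsed weeks. A minimal sketch of splitting the code, with a hypothetical helper name:

```python
def split_year_week(code):
    """Split an integer of the assumed form YYYYWW into (year, week)."""
    year, week = divmod(code, 100)
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in {code}")
    return year, week


# The two largest codes in the table above differ by 49 as raw integers,
# yet under the YYYYWW assumption they are consecutive weeks.
last_2007 = split_year_week(200752)
first_2008 = split_year_week(200801)
print(last_2007, first_2008)
```

Splitting the code before computing statistics would avoid treating the year boundary as a gap of 49 units.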

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB
2007
160 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters640
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2007
2nd row2007
3rd row2007
4th row2007
5th row2007

Common Values

ValueCountFrequency (%)
2007160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2007160
100.0%

Most occurring characters

ValueCountFrequency (%)
0320
50.0%
2160
25.0%
7160
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number640
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0320
50.0%
2160
25.0%
7160
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common640
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0320
50.0%
2160
25.0%
7160
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII640
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0320
50.0%
2160
25.0%
7160
25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 9.3 KiB
33
158 
31
 
2

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters320
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33158
98.8%
312
 
1.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33158
98.8%
312
 
1.2%

Most occurring characters

ValueCountFrequency (%)
3318
99.4%
12
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number320
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3318
99.4%
12
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common320
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3318
99.4%
12
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII320
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3318
99.4%
12
 
0.6%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 15
Distinct (%): 9.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330200.2812
Minimum: 312770
Maximum: 330455
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 312770
5-th percentile: 330079.5
Q1: 330455
median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 330455
Range: 17685
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1919.754738
Coefficient of variation (CV): 0.005813910063
Kurtosis: 77.2297767
Mean: 330200.2812
Median Absolute Deviation (MAD): 0
Skewness: -8.83102298
Sum: 52832045
Variance: 3685458.254
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Common values

Value             Count  Frequency (%)
330455            131    81.9%
330240            6      3.8%
330330            4      2.5%
330220            3      1.9%
330010            3      1.9%
330200            2      1.2%
330360            2      1.2%
330070            2      1.2%
312770            1      0.6%
330420            1      0.6%
Other values (5)  5      3.1%

Minimum 10 values

Value             Count  Frequency (%)
312770            1      0.6%
313670            1      0.6%
330010            3      1.9%
330040            1      0.6%
330070            2      1.2%
330080            1      0.6%
330200            2      1.2%
330220            3      1.9%
330240            6      3.8%
330330            4      2.5%

Maximum 10 values

Value             Count  Frequency (%)
330455            131    81.9%
330420            1      0.6%
330410            1      0.6%
330360            2      1.2%
330340            1      0.6%
330330            4      2.5%
330240            6      3.8%
330220            3      1.9%
330200            2      1.2%
330080            1      0.6%

ID_REGIONA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB
158 
1471
 
1
1452
 
1

Length

Max length4
Median length0
Mean length0.05
Min length0

Characters and Unicode

Total characters8
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique2 ?
Unique (%)1.2%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value           Count  Frequency (%)
(empty string)  158    98.8%
1471            1      0.6%
1452            1      0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
14711
50.0%
14521
50.0%

Most occurring characters

ValueCountFrequency (%)
13
37.5%
42
25.0%
71
 
12.5%
51
 
12.5%
21
 
12.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number8
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
13
37.5%
42
25.0%
71
 
12.5%
51
 
12.5%
21
 
12.5%

Most occurring scripts

ValueCountFrequency (%)
Common8
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
13
37.5%
42
25.0%
71
 
12.5%
51
 
12.5%
21
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII8
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13
37.5%
42
25.0%
71
 
12.5%
51
 
12.5%
21
 
12.5%

ID_UNIDADE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 50
Distinct (%): 31.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2362906.263
Minimum: 63
Maximum: 5237033
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 63
5-th percentile: 2269771.55
Q1: 2269805
median: 2279207
Q3: 2288338
95-th percentile: 3195138.55
Maximum: 5237033
Range: 5236970
Interquartile range (IQR): 18533

Descriptive statistics

Standard deviation: 615347.6219
Coefficient of variation (CV): 0.2604198193
Kurtosis: 10.6302858
Mean: 2362906.263
Median Absolute Deviation (MAD): 9402
Skewness: -0.1477729695
Sum: 378065002
Variance: 3.786526958 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
2269805            50     31.2%
2280167            24     15.0%
2288338            10     6.2%
2276534            6      3.8%
2297833            6      3.8%
3005992            4      2.5%
2280795            4      2.5%
63                 3      1.9%
3333868            3      1.9%
2270609            3      1.9%
Other values (40)  47     29.4%

Minimum 10 values

Value              Count  Frequency (%)
63                 3      1.9%
76                 1      0.6%
12505              1      0.6%
2219948            1      0.6%
2269295            1      0.6%
2269554            1      0.6%
2269783            2      1.2%
2269805            50     31.2%
2269988            2      1.2%
2270250            1      0.6%

Maximum 10 values

Value              Count  Frequency (%)
5237033            1      0.6%
5158044            1      0.6%
3810348            2      1.2%
3375471            1      0.6%
3333868            3      1.9%
3187837            1      0.6%
3185095            1      0.6%
3065634            2      1.2%
3060209            1      0.6%
3046281            1      0.6%

Distinct: 119
Distinct (%): 74.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB
Minimum: 2006-12-18 00:00:00
Maximum: 2007-12-29 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 52
Distinct (%): 32.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 200725.1063
Minimum: 200651
Maximum: 200752
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 200651
5-th percentile: 200703
Q1: 200712
median: 200725
Q3: 200739
95-th percentile: 200751
Maximum: 200752
Range: 101
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 18.29582104
Coefficient of variation (CV): 9.114864296 × 10^-5
Kurtosis: 3.039875949
Mean: 200725.1063
Median Absolute Deviation (MAD): 14
Skewness: -1.040930211
Sum: 32116017
Variance: 334.7370676
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
200710             7      4.4%
200751             7      4.4%
200718             7      4.4%
200737             7      4.4%
200741             5      3.1%
200716             5      3.1%
200749             5      3.1%
200719             5      3.1%
200739             5      3.1%
200703             4      2.5%
Other values (42)  103    64.4%

Minimum 10 values

Value              Count  Frequency (%)
200651             2      1.2%
200652             1      0.6%
200701             3      1.9%
200702             1      0.6%
200703             4      2.5%
200704             3      1.9%
200705             2      1.2%
200706             4      2.5%
200707             3      1.9%
200708             1      0.6%

Maximum 10 values

Value              Count  Frequency (%)
200752             3      1.9%
200751             7      4.4%
200750             2      1.2%
200749             5      3.1%
200748             4      2.5%
200747             2      1.2%
200746             2      1.2%
200745             2      1.2%
200744             1      0.6%
200743             1      0.6%

DT_NASC
Date

MISSING

Distinct: 135
Distinct (%): 91.8%
Missing: 13
Missing (%): 8.1%
Memory size: 1.4 KiB
Minimum: 1933-01-28 00:00:00
Maximum: 2003-03-25 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 58
Distinct (%): 36.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4038.60625
Minimum: 4003
Maximum: 4078
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 4003
5-th percentile: 4016.95
Q1: 4026.75
median: 4039
Q3: 4049
95-th percentile: 4062.05
Maximum: 4078
Range: 75
Interquartile range (IQR): 22.25

Descriptive statistics

Standard deviation: 15.02036223
Coefficient of variation (CV): 0.003719194519
Kurtosis: -0.4977281891
Mean: 4038.60625
Median Absolute Deviation (MAD): 11
Skewness: 0.03995803105
Sum: 646177
Variance: 225.6112814
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
4046               7      4.4%
4049               7      4.4%
4040               5      3.1%
4047               5      3.1%
4027               5      3.1%
4050               5      3.1%
4054               5      3.1%
4039               4      2.5%
4037               4      2.5%
4048               4      2.5%
Other values (48)  109    68.1%

Minimum 10 values

Value              Count  Frequency (%)
4003               1      0.6%
4005               1      0.6%
4006               1      0.6%
4010               1      0.6%
4014               2      1.2%
4016               2      1.2%
4017               4      2.5%
4018               1      0.6%
4019               4      2.5%
4020               3      1.9%

Maximum 10 values

Value              Count  Frequency (%)
4078               1      0.6%
4074               1      0.6%
4073               1      0.6%
4066               1      0.6%
4065               2      1.2%
4064               1      0.6%
4063               1      0.6%
4062               2      1.2%
4060               1      0.6%
4059               3      1.9%
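
Every NU_IDADE_N value lies between 4003 and 4078, which would be consistent with an encoded age in which the leading digit is a unit code and the last three digits are the amount (so 4039 would read as 39 years). The report does not state this; the unit mapping below is an assumption to verify against the dataset's data dictionary, and `decode_age` is a hypothetical helper.

```python
# Assumed unit codes for the leading digit; verify against the data dictionary.
AGE_UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}


def decode_age(code):
    """Split an encoded age into (amount, unit) under the assumed layout."""
    unit_code, amount = divmod(code, 1000)
    return amount, AGE_UNITS.get(unit_code, "unknown")


# Minimum, median and maximum from the quantile statistics above.
for code in (4003, 4039, 4078):
    print(code, decode_age(code))
```

If the assumption holds, the decoded range of 3 to 78 years is in line with the DT_NASC birth dates (1933 to 2003) for cases notified in 2007.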

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB
M
122 
F
38 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters160
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowM
3rd rowF
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
M122
76.2%
F38
 
23.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m122
76.2%
f38
 
23.8%

Most occurring characters

ValueCountFrequency (%)
M122
76.2%
F38
 
23.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter160
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M122
76.2%
F38
 
23.8%

Most occurring scripts

ValueCountFrequency (%)
Latin160
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M122
76.2%
F38
 
23.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M122
76.2%
F38
 
23.8%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 KiB
6
125 
9
26 
5
 
8
2
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters160
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)0.6%

Sample

1st row6
2nd row6
3rd row9
4th row6
5th row6

Common Values

ValueCountFrequency (%)
6125
78.1%
926
 
16.2%
58
 
5.0%
21
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6125
78.1%
926
 
16.2%
58
 
5.0%
21
 
0.6%

Most occurring characters

ValueCountFrequency (%)
6125
78.1%
926
 
16.2%
58
 
5.0%
21
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6125
78.1%
926
 
16.2%
58
 
5.0%
21
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6125
78.1%
926
 
16.2%
58
 
5.0%
21
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6125
78.1%
926
 
16.2%
58
 
5.0%
21
 
0.6%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
1
106 
4
21 
18 
2
13 
5
 
2

Length

Max length1
Median length1
Mean length0.8875
Min length0

Characters and Unicode

Total characters142
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row4
3rd row1
4th row1
5th row

Common Values

Value           Count  Frequency (%)
1               106    66.2%
4               21     13.1%
(empty string)  18     11.2%
2               13     8.1%
5               2      1.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1106
74.6%
421
 
14.8%
213
 
9.2%
52
 
1.4%

Most occurring characters

ValueCountFrequency (%)
1106
74.6%
421
 
14.8%
213
 
9.2%
52
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number142
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1106
74.6%
421
 
14.8%
213
 
9.2%
52
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Common142
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1106
74.6%
421
 
14.8%
213
 
9.2%
52
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII142
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1106
74.6%
421
 
14.8%
213
 
9.2%
52
 
1.4%
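
The Common Values table and the later tables report different percentages for the same counts (66.2% versus 74.6% for value 1). The figures are consistent with two denominators: all 160 rows in the first case, and the 142 non-empty entries in the second (each non-empty value is one character, which matches the Total characters figure). The sketch below reproduces that arithmetic; it is an inference from the numbers, not a description of the report's internals.

```python
# Counts from the CS_RACA Common Values table; "" is the empty-string value.
counts = {"1": 106, "4": 21, "": 18, "2": 13, "5": 2}

total_rows = sum(counts.values())
non_empty = sum(n for value, n in counts.items() if value != "")

# Share of all rows, as in the Common Values table.
share_of_rows = {v: 100 * n / total_rows for v, n in counts.items()}
# Share of non-empty entries, as in the later tables.
share_of_non_empty = {v: 100 * n / non_empty for v, n in counts.items() if v != ""}

for value in ("1", "4", "2", "5"):
    print(f"{value}: {share_of_rows[value]:.1f}% of rows, "
          f"{share_of_non_empty[value]:.1f}% of non-empty entries")
```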

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 11
Distinct (%): 6.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.4 KiB
05
55 
28 
04
23 
09
13 
08
13 
Other values (6)
28 

Length

Max length2
Median length2
Mean length1.65
Min length0

Characters and Unicode

Total characters264
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)0.6%

Sample

1st row10
2nd row10
3rd row04
4th row05
5th row

Common Values

Value           Count  Frequency (%)
05              55     34.4%
(empty string)  28     17.5%
04              23     14.4%
09              13     8.1%
08              13     8.1%
06              8      5.0%
02              7      4.4%
03              5      3.1%
07              4      2.5%
10              3      1.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0555
41.7%
0423
17.4%
0913
 
9.8%
0813
 
9.8%
068
 
6.1%
027
 
5.3%
035
 
3.8%
074
 
3.0%
103
 
2.3%
011
 
0.8%

Most occurring characters

ValueCountFrequency (%)
0132
50.0%
555
20.8%
423
 
8.7%
913
 
4.9%
813
 
4.9%
68
 
3.0%
27
 
2.7%
35
 
1.9%
14
 
1.5%
74
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number264
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0132
50.0%
555
20.8%
423
 
8.7%
913
 
4.9%
813
 
4.9%
68
 
3.0%
27
 
2.7%
35
 
1.9%
14
 
1.5%
74
 
1.5%

Most occurring scripts

ValueCountFrequency (%)
Common264
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0132
50.0%
555
20.8%
423
 
8.7%
913
 
4.9%
813
 
4.9%
68
 
3.0%
27
 
2.7%
35
 
1.9%
14
 
1.5%
74
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII264
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0132
50.0%
555
20.8%
423
 
8.7%
913
 
4.9%
813
 
4.9%
68
 
3.0%
27
 
2.7%
35
 
1.9%
14
 
1.5%
74
 
1.5%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.3 KiB
33
160 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters320
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33160
100.0%

Most occurring characters

ValueCountFrequency (%)
3320
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number320
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3320
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common320
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3320
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII320
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3320
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 24
Distinct (%): 15.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330408.4562
Minimum: 330010
Maximum: 330610
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 330010
5-th percentile: 330080
Q1: 330417.5
median: 330455
Q3: 330455
95-th percentile: 330513.5
Maximum: 330610
Range: 600
Interquartile range (IQR): 37.5

Descriptive statistics

Standard deviation: 120.3868945
Coefficient of variation (CV): 0.0003643577887
Kurtosis: 3.05992688
Mean: 330408.4562
Median Absolute Deviation (MAD): 0
Skewness: -1.755856728
Sum: 52865353
Variance: 14493.00436
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Common values

Value              Count  Frequency (%)
330455             107    66.9%
330330             7      4.4%
330240             6      3.8%
330610             4      2.5%
330580             4      2.5%
330010             3      1.9%
330340             3      1.9%
330350             3      1.9%
330170             3      1.9%
330410             2      1.2%
Other values (14)  18     11.2%

Minimum 10 values

Value              Count  Frequency (%)
330010             3      1.9%
330023             1      0.6%
330040             1      0.6%
330070             2      1.2%
330080             2      1.2%
330170             3      1.9%
330200             1      0.6%
330205             1      0.6%
330220             1      0.6%
330240             6      3.8%

Maximum 10 values

Value              Count  Frequency (%)
330610             4      2.5%
330580             4      2.5%
330510             2      1.2%
330490             1      0.6%
330460             1      0.6%
330455             107    66.9%
330420             1      0.6%
330410             2      1.2%
330370             1      0.6%
330360             2      1.2%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB
160 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 KiB
1
160 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters160
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1160
100.0%

Most occurring characters

ValueCountFrequency (%)
1160
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number160
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1160
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1160
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1160
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 160
Missing (%): 100.0%
Memory size: 1.4 KiB

ID_OCUPA_N
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB
160 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

CLASSI_FIN
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.7 KiB
160 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
160
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct11
Distinct (%)6.9%
Missing0
Missing (%)0.0%
Memory size9.5 KiB
11
97 
10
25 
13 
99
 
8
4
 
5
Other values (6)
12 

Length

Max length2
Median length2
Mean length1.73125
Min length0

Characters and Unicode

Total characters277
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)1.2%

Sample

1st row10
2nd row11
3rd row
4th row11
5th row2

Common Values

ValueCountFrequency (%)
1197
60.6%
1025
 
15.6%
13
 
8.1%
998
 
5.0%
45
 
3.1%
23
 
1.9%
33
 
1.9%
12
 
1.2%
92
 
1.2%
81
 
0.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1197
66.0%
1025
 
17.0%
998
 
5.4%
45
 
3.4%
23
 
2.0%
33
 
2.0%
12
 
1.4%
92
 
1.4%
81
 
0.7%
71
 
0.7%

Most occurring characters

ValueCountFrequency (%)
1221
79.8%
025
 
9.0%
918
 
6.5%
45
 
1.8%
23
 
1.1%
33
 
1.1%
81
 
0.4%
71
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number277
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1221
79.8%
025
 
9.0%
918
 
6.5%
45
 
1.8%
23
 
1.1%
33
 
1.1%
81
 
0.4%
71
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common277
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1221
79.8%
025
 
9.0%
918
 
6.5%
45
 
1.8%
23
 
1.1%
33
 
1.1%
81
 
0.4%
71
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII277
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1221
79.8%
025
 
9.0%
918
 
6.5%
45
 
1.8%
23
 
1.1%
33
 
1.1%
81
 
0.4%
71
 
0.4%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size10.4 KiB
2
122 
1
21 
13 
3
 
4

Length

Max length1
Median length1
Mean length0.91875
Min length0

Characters and Unicode

Total characters147
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row
4th row2
5th row3

Common Values

ValueCountFrequency (%)
2122
76.2%
121
 
13.1%
13
 
8.1%
34
 
2.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2122
83.0%
121
 
14.3%
34
 
2.7%

Most occurring characters

ValueCountFrequency (%)
2122
83.0%
121
 
14.3%
34
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2122
83.0%
121
 
14.3%
34
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Common147
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2122
83.0%
121
 
14.3%
34
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII147
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2122
83.0%
121
 
14.3%
34
 
2.7%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size10.4 KiB
1
147 
 
13

Length

Max length1
Median length1
Mean length0.91875
Min length0

Characters and Unicode

Total characters147
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1147
91.9%
13
 
8.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1147
100.0%

Most occurring characters

ValueCountFrequency (%)
1147
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1147
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common147
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1147
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII147
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1147
100.0%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)2.5%
Missing0
Missing (%)0.0%
Memory size10.4 KiB
2
114 
3
30 
13 
1
 
3

Length

Max length1
Median length1
Mean length0.91875
Min length0

Characters and Unicode

Total characters147
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row3
2nd row3
3rd row2
4th row3
5th row2

Common Values

ValueCountFrequency (%)
2114
71.2%
330
 
18.8%
13
 
8.1%
13
 
1.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2114
77.6%
330
 
20.4%
13
 
2.0%

Most occurring characters

ValueCountFrequency (%)
2114
77.6%
330
 
20.4%
13
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2114
77.6%
330
 
20.4%
13
 
2.0%

Most occurring scripts

ValueCountFrequency (%)
Common147
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2114
77.6%
330
 
20.4%
13
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII147
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2114
77.6%
330
 
20.4%
13
 
2.0%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct9
Distinct (%)5.6%
Missing0
Missing (%)0.0%
Memory size10.2 KiB
103 
AM
27 
PA
 
10
RO
 
8
RJ
 
8
Other values (4)
 
4

Length

Max length2
Median length0
Mean length0.7125
Min length0

Characters and Unicode

Total characters114
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)2.5%

Sample

1st row
2nd row
3rd row
4th row
5th rowMA

Common Values

ValueCountFrequency (%)
103
64.4%
AM27
 
16.9%
PA10
 
6.2%
RO8
 
5.0%
RJ8
 
5.0%
CE1
 
0.6%
MT1
 
0.6%
AC1
 
0.6%
MA1
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
am27
47.4%
pa10
 
17.5%
ro8
 
14.0%
rj8
 
14.0%
ma1
 
1.8%
ac1
 
1.8%
mt1
 
1.8%
ce1
 
1.8%

Most occurring characters

ValueCountFrequency (%)
A39
34.2%
M29
25.4%
R16
14.0%
P10
 
8.8%
O8
 
7.0%
J8
 
7.0%
C2
 
1.8%
E1
 
0.9%
T1
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter114
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A39
34.2%
M29
25.4%
R16
14.0%
P10
 
8.8%
O8
 
7.0%
J8
 
7.0%
C2
 
1.8%
E1
 
0.9%
T1
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Latin114
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A39
34.2%
M29
25.4%
R16
14.0%
P10
 
8.8%
O8
 
7.0%
J8
 
7.0%
C2
 
1.8%
E1
 
0.9%
T1
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII114
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A39
34.2%
M29
25.4%
R16
14.0%
P10
 
8.8%
O8
 
7.0%
J8
 
7.0%
C2
 
1.8%
E1
 
0.9%
T1
 
0.9%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct16
Distinct (%)10.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean24.125
Minimum0
Maximum199
Zeros43
Zeros (%)26.9%
Negative0
Negative (%)0.0%
Memory size1.4 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q331
95-th percentile140
Maximum199
Range199
Interquartile range (IQR)31

Descriptive statistics

Standard deviation45.70437547
Coefficient of variation (CV)1.894481885
Kurtosis5.046061334
Mean24.125
Median Absolute Deviation (MAD)1
Skewness2.402921521
Sum3860
Variance2088.889937
MonotonicityNot monotonic
Histogram with fixed size bins (bins=16)
ValueCountFrequency (%)
157
35.6%
043
26.9%
3133
20.6%
225
 
3.1%
1404
 
2.5%
1903
 
1.9%
1522
 
1.2%
1142
 
1.2%
1112
 
1.2%
682
 
1.2%
Other values (6)7
 
4.4%
ValueCountFrequency (%)
043
26.9%
157
35.6%
22
 
1.2%
225
 
3.1%
3133
20.6%
451
 
0.6%
682
 
1.2%
1112
 
1.2%
1121
 
0.6%
1131
 
0.6%
ValueCountFrequency (%)
1991
 
0.6%
1903
1.9%
1771
 
0.6%
1522
1.2%
1404
2.5%
1142
1.2%
1131
 
0.6%
1121
 
0.6%
1112
1.2%
682
1.2%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct20
Distinct (%)12.5%
Missing0
Missing (%)0.0%
Memory size9.8 KiB
103 
130260
24 
330340
 
4
110020
 
4
110012
 
3
Other values (15)
22 

Length

Max length6
Median length0
Mean length2.1375
Min length0

Characters and Unicode

Total characters342
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique10 ?
Unique (%)6.2%

Sample

1st row
2nd row
3rd row
4th row
5th row210735

Common Values

ValueCountFrequency (%)
103
64.4%
13026024
 
15.0%
3303404
 
2.5%
1100204
 
2.5%
1100123
 
1.9%
1507303
 
1.9%
1501403
 
1.9%
3305502
 
1.2%
1508152
 
1.2%
1303562
 
1.2%
Other values (10)10
 
6.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
13026024
42.1%
3303404
 
7.0%
1100204
 
7.0%
1100123
 
5.3%
1507303
 
5.3%
1501403
 
5.3%
3305502
 
3.5%
1508152
 
3.5%
1303562
 
3.5%
1100301
 
1.8%
Other values (9)9
 
15.8%

Most occurring characters

ValueCountFrequency (%)
0113
33.0%
165
19.0%
358
17.0%
235
 
10.2%
629
 
8.5%
522
 
6.4%
48
 
2.3%
76
 
1.8%
85
 
1.5%
91
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number342
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0113
33.0%
165
19.0%
358
17.0%
235
 
10.2%
629
 
8.5%
522
 
6.4%
48
 
2.3%
76
 
1.8%
85
 
1.5%
91
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common342
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0113
33.0%
165
19.0%
358
17.0%
235
 
10.2%
629
 
8.5%
522
 
6.4%
48
 
2.3%
76
 
1.8%
85
 
1.5%
91
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII342
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0113
33.0%
165
19.0%
358
17.0%
235
 
10.2%
629
 
8.5%
522
 
6.4%
48
 
2.3%
76
 
1.8%
85
 
1.5%
91
 
0.3%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)6.2%
Missing0
Missing (%)0.0%
Memory size9.9 KiB
149 
MANA
 
2
JI-P
 
2
SERI
 
1
PARA
 
1
Other values (5)
 
5

Length

Max length4
Median length0
Mean length0.2625
Min length0

Characters and Unicode

Total characters42
Distinct characters16
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique7 ?
Unique (%)4.4%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
149
93.1%
MANA2
 
1.2%
JI-P2
 
1.2%
SERI1
 
0.6%
PARA1
 
0.6%
JI P1
 
0.6%
RIO1
 
0.6%
SAO1
 
0.6%
BELE1
 
0.6%
VIST1
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
mana2
16.7%
ji-p2
16.7%
vist1
8.3%
ji1
8.3%
seri1
8.3%
p1
8.3%
rio1
8.3%
sao1
8.3%
bele1
8.3%
para1
8.3%

Most occurring characters

ValueCountFrequency (%)
A7
16.7%
I6
14.3%
P4
9.5%
E3
7.1%
S3
7.1%
J3
7.1%
R3
7.1%
-2
 
4.8%
M2
 
4.8%
N2
 
4.8%
Other values (6)7
16.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter39
92.9%
Dash Punctuation2
 
4.8%
Space Separator1
 
2.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A7
17.9%
I6
15.4%
P4
10.3%
E3
7.7%
S3
7.7%
J3
7.7%
R3
7.7%
M2
 
5.1%
N2
 
5.1%
O2
 
5.1%
Other values (4)4
10.3%
Dash Punctuation
ValueCountFrequency (%)
-2
100.0%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin39
92.9%
Common3
 
7.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A7
17.9%
I6
15.4%
P4
10.3%
E3
7.7%
S3
7.7%
J3
7.7%
R3
7.7%
M2
 
5.1%
N2
 
5.1%
O2
 
5.1%
Other values (4)4
10.3%
Common
ValueCountFrequency (%)
-2
66.7%
1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII42
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A7
16.7%
I6
14.3%
P4
9.5%
E3
7.1%
S3
7.1%
J3
7.1%
R3
7.1%
-2
 
4.8%
M2
 
4.8%
N2
 
4.8%
Other values (6)7
16.7%

DEXAME
Categorical

HIGH CARDINALITY

Distinct103
Distinct (%)64.4%
Missing0
Missing (%)0.0%
Memory size10.5 KiB
None
13 
2007-12-27
 
5
2007-05-03
 
5
2007-10-10
 
5
2007-07-10
 
4
Other values (98)
128 

Length

Max length10
Median length10
Mean length9.5125
Min length4

Characters and Unicode

Total characters1522
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique75 ?
Unique (%)46.9%

Sample

1st row2007-01-23
2nd row2007-01-25
3rd rowNone
4th row2007-01-15
5th row2007-01-03

Common Values

ValueCountFrequency (%)
None13
 
8.1%
2007-12-275
 
3.1%
2007-05-035
 
3.1%
2007-10-105
 
3.1%
2007-07-104
 
2.5%
2007-09-133
 
1.9%
2007-02-223
 
1.9%
2007-09-103
 
1.9%
2007-04-273
 
1.9%
2007-04-103
 
1.9%
Other values (93)113
70.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none13
 
8.1%
2007-12-275
 
3.1%
2007-05-035
 
3.1%
2007-10-105
 
3.1%
2007-07-104
 
2.5%
2007-04-273
 
1.9%
2007-05-113
 
1.9%
2007-04-103
 
1.9%
2007-09-103
 
1.9%
2007-02-223
 
1.9%
Other values (93)113
70.6%

Most occurring characters

ValueCountFrequency (%)
0477
31.3%
-294
19.3%
2233
15.3%
7177
 
11.6%
1134
 
8.8%
334
 
2.2%
531
 
2.0%
926
 
1.7%
622
 
1.4%
822
 
1.4%
Other values (5)72
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1176
77.3%
Dash Punctuation294
 
19.3%
Lowercase Letter39
 
2.6%
Uppercase Letter13
 
0.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0477
40.6%
2233
19.8%
7177
 
15.1%
1134
 
11.4%
334
 
2.9%
531
 
2.6%
926
 
2.2%
622
 
1.9%
822
 
1.9%
420
 
1.7%
Lowercase Letter
ValueCountFrequency (%)
o13
33.3%
n13
33.3%
e13
33.3%
Dash Punctuation
ValueCountFrequency (%)
-294
100.0%
Uppercase Letter
ValueCountFrequency (%)
N13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1470
96.6%
Latin52
 
3.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0477
32.4%
-294
20.0%
2233
15.9%
7177
 
12.0%
1134
 
9.1%
334
 
2.3%
531
 
2.1%
926
 
1.8%
622
 
1.5%
822
 
1.5%
Latin
ValueCountFrequency (%)
N13
25.0%
o13
25.0%
n13
25.0%
e13
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1522
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0477
31.3%
-294
19.3%
2233
15.3%
7177
 
11.6%
1134
 
8.8%
334
 
2.2%
531
 
2.0%
926
 
1.7%
622
 
1.4%
822
 
1.4%
Other values (5)72
 
4.7%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct6
Distinct (%)3.8%
Missing0
Missing (%)0.0%
Memory size10.4 KiB
1
71 
4
38 
2
36 
13 
5
 
1

Length

Max length1
Median length1
Mean length0.91875
Min length0

Characters and Unicode

Total characters147
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)1.2%

Sample

1st row1
2nd row1
3rd row
4th row1
5th row4

Common Values

ValueCountFrequency (%)
171
44.4%
438
23.8%
236
22.5%
13
 
8.1%
51
 
0.6%
81
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
171
48.3%
438
25.9%
236
24.5%
51
 
0.7%
81
 
0.7%

Most occurring characters

ValueCountFrequency (%)
171
48.3%
438
25.9%
236
24.5%
51
 
0.7%
81
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number147
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
171
48.3%
438
25.9%
236
24.5%
51
 
0.7%
81
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Common147
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
171
48.3%
438
25.9%
236
24.5%
51
 
0.7%
81
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII147
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
171
48.3%
438
25.9%
236
24.5%
51
 
0.7%
81
 
0.7%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct45
Distinct (%)59.2%
Missing84
Missing (%)52.5%
Infinite0
Infinite (%)0.0%
Mean1321628.829
Minimum1
Maximum99999999
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.4 KiB

Quantile statistics

Minimum1
5-th percentile5
Q1300
median380
Q3680
95-th percentile45460
Maximum99999999
Range99999998
Interquartile range (IQR)380

Descriptive statistics

Standard deviation11470124.99
Coefficient of variation (CV)8.678779354
Kurtosis75.99952652
Mean1321628.829
Median Absolute Deviation (MAD)150
Skewness8.717757706
Sum100443791
Variance1.315637674 × 10^14
MonotonicityNot monotonic
Histogram with fixed size bins (bins=45)
ValueCountFrequency (%)
3807
 
4.4%
5016
 
3.8%
3105
 
3.1%
2204
 
2.5%
3014
 
2.5%
1000003
 
1.9%
6802
 
1.2%
3502
 
1.2%
52
 
1.2%
1002
 
1.2%
Other values (35)39
24.4%
(Missing)84
52.5%
ValueCountFrequency (%)
11
0.6%
21
0.6%
31
0.6%
52
1.2%
101
0.6%
1002
1.2%
1701
0.6%
2001
0.6%
2101
0.6%
2151
0.6%
ValueCountFrequency (%)
999999991
 
0.6%
1000003
1.9%
272801
 
0.6%
250001
 
0.6%
150002
1.2%
100101
 
0.6%
100012
1.2%
48801
 
0.6%
28801
 
0.6%
15201
 
0.6%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size10.0 KiB
84 
3
27 
4
17 
2
13 
1
 
8
Other values (2)
11 

Length

Max length1
Median length0
Mean length0.475
Min length0

Characters and Unicode

Total characters76
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row2

Common Values

ValueCountFrequency (%)
84
52.5%
327
 
16.9%
417
 
10.6%
213
 
8.1%
18
 
5.0%
58
 
5.0%
63
 
1.9%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
327
35.5%
417
22.4%
213
17.1%
18
 
10.5%
58
 
10.5%
63
 
3.9%

Most occurring characters

ValueCountFrequency (%)
327
35.5%
417
22.4%
213
17.1%
58
 
10.5%
18
 
10.5%
63
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number76
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
327
35.5%
417
22.4%
213
17.1%
58
 
10.5%
18
 
10.5%
63
 
3.9%

Most occurring scripts

ValueCountFrequency (%)
Common76
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
327
35.5%
417
22.4%
213
17.1%
58
 
10.5%
18
 
10.5%
63
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII76
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
327
35.5%
417
22.4%
213
17.1%
58
 
10.5%
18
 
10.5%
63
 
3.9%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Memory size10.1 KiB
84 
1
35 
6
25 
99
13 
7
 
1
Other values (2)
 
2

Length

Max length2
Median length0
Mean length0.5625
Min length0

Characters and Unicode

Total characters90
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.9%

Sample

1st row
2nd row
3rd row
4th row
5th row1

Common Values

ValueCountFrequency (%)
84
52.5%
135
21.9%
625
 
15.6%
9913
 
8.1%
71
 
0.6%
101
 
0.6%
31
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
135
46.1%
625
32.9%
9913
 
17.1%
101
 
1.3%
31
 
1.3%
71
 
1.3%

Most occurring characters

ValueCountFrequency (%)
136
40.0%
926
28.9%
625
27.8%
31
 
1.1%
01
 
1.1%
71
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number90
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
136
40.0%
926
28.9%
625
27.8%
31
 
1.1%
01
 
1.1%
71
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Common90
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
136
40.0%
926
28.9%
625
27.8%
31
 
1.1%
01
 
1.1%
71
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII90
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
136
40.0%
926
28.9%
625
27.8%
31
 
1.1%
01
 
1.1%
71
 
1.1%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct10
Distinct (%)6.2%
Missing0
Missing (%)0.0%
Memory size10.4 KiB
151 
ARTESUNATO INJETAVEL E CLINDAM
 
1
INFECCOES POR PV COM CLOROQUIN
 
1
ARTEMETE
 
1
ARTENETOR
 
1
Other values (5)
 
5

Length

Max length30
Median length0
Mean length0.99375
Min length0

Characters and Unicode

Total characters159
Distinct characters22
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9 ?
Unique (%)5.6%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
151
94.4%
ARTESUNATO INJETAVEL E CLINDAM1
 
0.6%
INFECCOES POR PV COM CLOROQUIN1
 
0.6%
ARTEMETE1
 
0.6%
ARTENETOR1
 
0.6%
ARTEMETERIM+DOXICICLINA1
 
0.6%
MEFLOQUINA1
 
0.6%
ARTSUNATO INJETAVEL1
 
0.6%
ARTESUNATO INJETAVEL1
 
0.6%
ARTEMETHER1
 
0.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
injetavel3
16.7%
artesunato2
 
11.1%
clindam1
 
5.6%
artenetor1
 
5.6%
cloroquin1
 
5.6%
artemeterim+doxiciclina1
 
5.6%
artemether1
 
5.6%
infeccoes1
 
5.6%
e1
 
5.6%
por1
 
5.6%
Other values (5)5
27.8%

Most occurring characters

ValueCountFrequency (%)
E23
14.5%
T17
10.7%
A16
10.1%
N12
 
7.5%
R12
 
7.5%
I11
 
6.9%
O11
 
6.9%
9
 
5.7%
C7
 
4.4%
M7
 
4.4%
Other values (12)34
21.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter149
93.7%
Space Separator9
 
5.7%
Math Symbol1
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E23
15.4%
T17
11.4%
A16
10.7%
N12
8.1%
R12
8.1%
I11
7.4%
O11
7.4%
C7
 
4.7%
M7
 
4.7%
L7
 
4.7%
Other values (10)26
17.4%
Space Separator
ValueCountFrequency (%)
9
100.0%
Math Symbol
ValueCountFrequency (%)
+1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin149
93.7%
Common10
 
6.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
E23
15.4%
T17
11.4%
A16
10.7%
N12
8.1%
R12
8.1%
I11
7.4%
O11
7.4%
C7
 
4.7%
M7
 
4.7%
L7
 
4.7%
Other values (10)26
17.4%
Common
ValueCountFrequency (%)
9
90.0%
+1
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII159
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E23
14.5%
T17
10.7%
A16
10.1%
N12
 
7.5%
R12
 
7.5%
I11
 
6.9%
O11
 
6.9%
9
 
5.7%
C7
 
4.4%
M7
 
4.4%
Other values (12)34
21.4%

DTRATA
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct60
Distinct (%)37.5%
Missing0
Missing (%)0.0%
Memory size10.1 KiB
None
84 
2007-10-10
 
4
2007-07-12
 
3
2007-12-02
 
3
2007-04-03
 
2
Other values (55)
64 

Length

Max length10
Median length4
Mean length6.85
Min length4

Characters and Unicode

Total characters1096
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique46 ?
Unique (%)28.7%

Sample

1st rowNone
2nd rowNone
3rd rowNone
4th rowNone
5th row2007-01-06

Common Values

ValueCountFrequency (%)
None84
52.5%
2007-10-104
 
2.5%
2007-07-123
 
1.9%
2007-12-023
 
1.9%
2007-04-032
 
1.2%
2007-10-262
 
1.2%
2007-12-062
 
1.2%
2007-08-272
 
1.2%
2007-10-032
 
1.2%
2007-04-272
 
1.2%
Other values (50)54
33.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none84
52.5%
2007-10-104
 
2.5%
2007-07-123
 
1.9%
2007-12-023
 
1.9%
2007-04-272
 
1.2%
2007-04-032
 
1.2%
2007-10-262
 
1.2%
2007-12-062
 
1.2%
2007-08-272
 
1.2%
2007-02-222
 
1.2%
Other values (50)54
33.8%

Most occurring characters

ValueCountFrequency (%)
0251
22.9%
-152
13.9%
2123
11.2%
794
 
8.6%
N84
 
7.7%
o84
 
7.7%
n84
 
7.7%
e84
 
7.7%
167
 
6.1%
315
 
1.4%
Other values (5)58
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number608
55.5%
Lowercase Letter252
23.0%
Dash Punctuation152
 
13.9%
Uppercase Letter84
 
7.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0251
41.3%
2123
20.2%
794
 
15.5%
167
 
11.0%
315
 
2.5%
613
 
2.1%
913
 
2.1%
513
 
2.1%
810
 
1.6%
49
 
1.5%
Lowercase Letter
ValueCountFrequency (%)
o84
33.3%
n84
33.3%
e84
33.3%
Uppercase Letter
ValueCountFrequency (%)
N84
100.0%
Dash Punctuation
ValueCountFrequency (%)
-152
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common760
69.3%
Latin336
30.7%

Most frequent character per script

Common
ValueCountFrequency (%)
0251
33.0%
-152
20.0%
2123
16.2%
794
 
12.4%
167
 
8.8%
315
 
2.0%
613
 
1.7%
913
 
1.7%
513
 
1.7%
810
 
1.3%
Latin
ValueCountFrequency (%)
N84
25.0%
o84
25.0%
n84
25.0%
e84
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1096
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0251
22.9%
-152
13.9%
2123
11.2%
794
 
8.6%
N84
 
7.7%
o84
 
7.7%
n84
 
7.7%
e84
 
7.7%
167
 
6.1%
315
 
1.4%
Other values (5)58
 
5.3%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing160
Missing (%)100.0%
Memory size1.4 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
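As a minimal sketch of the definition above (not the report generator's own implementation), r can be computed directly from the covariance and the standard deviations. The function name `pearson_r` and the toy inputs are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r is undefined for a constant variable
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up for illustration): a perfectly linear relationship.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(xs, ys))
# Shifting and positively rescaling xs leaves r unchanged.
print(pearson_r([10 * x + 5 for x in xs], ys))
```

Returning NaN for a constant input is a design choice of this sketch; it mirrors the fact that the coefficient is undefined when a standard deviation is zero.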

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
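The rank-based definition above can be sketched as follows: replace each value by its rank (ties share the average rank, an assumption of this sketch), then apply the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values all receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant variable
    return cov / (sx * sy)


# Toy data (made up): y = x**3 is nonlinear but monotonic, so the ranks agree exactly.
xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))
```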

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
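A minimal sketch of this pair-counting definition is shown below. It implements the plain form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant; libraries such as SciPy default to the tau-b variant, which adjusts the denominator for ties, so results can differ on tied data. The function name `kendall_tau_a` is an illustrative assumption.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it contributes to neither count
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (made up): 2 concordant pairs and 1 discordant pair out of 3, so tau = 1/3.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs is fine for a dataset of this size (160 observations); faster O(n log n) algorithms exist for large inputs.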

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
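For illustration, the sketch below computes a bias-corrected Cramér's V from two categorical sequences, following the correction commonly attributed to Bergsma (2013): the squared φ statistic and the table dimensions are each shrunk by a term in 1/(n-1). This is written from that published formula under the stated assumption, not copied from the report generator; the function name `cramers_v_corrected` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return float("nan")  # a constant variable has no defined association
    return math.sqrt(phi2_corr / denom)


# Toy data (made up): the second variable is fully determined by the first.
print(cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50))
```

A constant column, such as the variables flagged CONSTANT in this report, yields NaN in this sketch because the corrected table dimension collapses to 1.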

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
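To make the notion of nullity correlation concrete, the sketch below turns each column into a 0/1 missingness indicator and correlates the indicators. This is one common way to define it and is an assumption about the method, not a statement of how the heatmap in this report was produced. The function name and the small columns are made up for illustration; they only borrow variable names from this dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing carries no
        # information about co-occurring gaps.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy columns (not rows from the dataset); None marks a missing value.
pmm = [None, None, 2.0, 310.0, None, 50.0]
dtrata = [None, None, "2007-01-06", "2007-01-03", None, "2007-12-12"]
dt_invest = [None, None, None, None, None, None]

print(nullity_correlation(pmm, dtrata))     # gaps coincide in every row
print(nullity_correlation(pmm, dt_invest))  # fully missing column: undefined
```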

Sample

First rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
02B542007-01-2320070420073333045522698052007-01-222007042003-03-254003M6110333304551NaT1021302007-01-231NaNNoneNaT
12B542007-01-2520070420073333045522699882007-01-242007042000-05-164006M6410333301701NaT1121302007-01-251NaNNoneNaT
22B542007-01-1620070320073333045522692952006-12-182006511974-05-064032F9104333303301NaT231NoneNaNNoneNaT
32B542007-01-1520070320073333045531878372007-01-112007021959-03-144047M6105333304551NaT1121302007-01-151NaNNoneNaT
42B542007-01-0320070120073333042022886052006-12-262006521986-08-194020M6333304201NaT2312MA12107352007-01-0342.0212007-01-06NaT
52B542007-01-0320070120073333045522801672007-01-03200701NaT4010M6105333304551NaT4212452007-01-034310.0312007-01-03NaT
62B542007-01-1620070320073333045522801672007-01-152007031981-01-054026M6105333304551NaT11112PA1150140BELE2007-01-164310.0312007-01-16NaT
72B542007-01-1720070320073333024022765342007-01-17200703NaT4021M6109333302401NaT992102007-01-181NaNNoneNaT
82B542007-01-0320070120073333007022782862006-12-182006511973-07-224033M6104333300701NaT0NoneNaNNoneNaT
92B542007-01-2920070520073333045530462812007-01-262007041976-02-064030M61333304551NaT11212RO11100202007-01-294310.0312007-01-29NaT

Last rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
1502B542007-12-1220075020073333045522801672007-12-122007501953-10-024054M6105333304551NaT112121112007-12-124350.0312007-12-12NaT
1512B542007-12-3120080120073333045522801672007-12-292007521984-12-184023M6207333304551NaT101121772007-12-312500.0362007-12-31NaT
1522B542007-12-0220074920073333045522801672007-12-012007481957-06-284050M6106333303401NaT10211RJ13303402007-12-0241.01992007-12-02NaT
1532B542007-12-2920075220073333045522801672007-12-222007511950-09-134057F55333303301NaT99212AM11302602007-12-292200.0262007-12-29NaT
1542B542007-12-0620074920073333045522883382007-12-032007491969-05-044038M6105333301701NaT1121302007-12-06427280.0512007-12-06NaT
1552B542007-12-2720075220073333045522883382007-12-202007511981-12-184026F9104333304551NaT321302007-12-2744880.0412007-12-27NaT
1562B542007-12-0620074920073333045533754712007-12-032007491945-03-154062M6105333304551NaT1121302007-12-062220.0262007-12-06NaT
1572B542007-12-2620075220073333045530059922007-12-212007511991-01-024016F9105333304551NaT102121902007-12-261NaNNoneNaT
1582B542007-12-11200750200733330455632007-12-052007491975-11-064032F9105333304551NaT11212AM1130260SAO2007-12-114310.0212007-12-11NaT
1592B542007-12-1020075020073333020038103482007-12-082007491959-08-184048M6109333302001NaT1131302007-12-101NaNNoneNaT